For a wireless avionics communication system, a multi-armed bandit (MAB) game is formulated mathematically in terms of channel states, strategies, and rewards. The simplest case, in which only two agents share the spectrum, is studied in full with the goal of maximizing the cumulative reward over a finite time horizon. An Upper Confidence Bound (UCB) algorithm is used to obtain the optimal solution to the stochastic MAB problem. The MAB problem can also be solved from the perspective of a Markov game framework. Thompson Sampling (TS) is used as a benchmark to evaluate the performance of the proposed approach. Numerical results are provided on minimizing the expected regret and on choosing the best parameter for the upper confidence bound.
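As an illustration of the UCB idea mentioned above, the following is a minimal sketch and not the authors' implementation. It assumes a single agent, channels modeled as Bernoulli arms with fixed success probabilities, and the textbook UCB1 index; the function name `ucb1_channel_selection`, the exploration parameter `c`, and the example probabilities are all hypothetical choices made for this sketch. The two-agent spectrum-sharing game and the Thompson Sampling benchmark are not modeled here.

```python
import math
import random


def ucb1_channel_selection(success_probs, horizon, c=2.0, seed=0):
    """Run a UCB1-style policy on Bernoulli arms (hypothetical channel model).

    success_probs: assumed per-channel probability of a successful transmission.
    horizon: number of rounds (finite time horizon).
    c: exploration parameter scaling the confidence bonus (assumed tunable).
    Returns (pull counts per arm, total reward, pseudo-regret).
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms    # number of times each arm was played
    means = [0.0] * n_arms   # empirical mean reward of each arm
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Play every arm once so each index is well defined.
            arm = t - 1
        else:
            # Pick the arm with the largest mean-plus-bonus index.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(c * math.log(t) / counts[a]),
            )

        # Bernoulli reward: 1 for a successful transmission, 0 otherwise.
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total_reward += reward

    # Pseudo-regret: expected reward lost by playing suboptimal arms.
    best = max(success_probs)
    regret = sum((best - success_probs[a]) * counts[a] for a in range(n_arms))
    return counts, total_reward, regret


if __name__ == "__main__":
    counts, total, regret = ucb1_channel_selection([0.2, 0.8], horizon=2000)
    print("pulls per channel:", counts)
    print("total reward:", total)
    print("pseudo-regret:", regret)
```

Rerunning the sketch with different values of `c` gives a rough sense of how the confidence-bound parameter trades exploration against exploitation, which is the kind of tuning the numerical results refer to.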